The algorithm of noisy k-means
In this note, we introduce a new algorithm to deal with finite dimensional
clustering with errors in variables. The design of this algorithm is based on
recent theoretical advances (see Loustau (2013a,b)) in statistical learning
with errors in variables. As in the previously mentioned papers, the algorithm mixes
different tools from the inverse problem literature and the machine learning
community. Coarsely, it is based on a two-step procedure: (1) a deconvolution
step to deal with noisy inputs and (2) Newton's iterations as the popular
k-means
Discriminative variable selection for clustering with the sparse Fisher-EM algorithm
The interest in variable selection for clustering has increased recently due
to the growing need for clustering high-dimensional data. Variable selection
in particular eases both the clustering and the interpretation of the
results. Existing approaches have demonstrated the efficiency of variable
selection for clustering but turn out to be either very time consuming or not
sparse enough in high-dimensional spaces. This work proposes to perform a
selection of the discriminative variables by introducing sparsity in the
loading matrix of the Fisher-EM algorithm. This clustering method has been
recently proposed for the simultaneous visualization and clustering of
high-dimensional data. It is based on a latent mixture model which fits the
data into a low-dimensional discriminative subspace. Three different approaches
are proposed in this work to introduce sparsity in the orientation matrix of
the discriminative subspace through \ell_{1}-type penalizations. Experimental
comparisons with existing approaches on simulated and real-world data sets
demonstrate the interest of the proposed methodology. An application to the
segmentation of hyperspectral images of the planet Mars is also presented
Probabilistic Fisher discriminant analysis: A robust and flexible alternative to Fisher discriminant analysis
Fisher discriminant analysis (FDA) is a popular and powerful method for dimensionality reduction and classification. Unfortunately, the optimality of the dimension reduction provided by FDA is only proved in the homoscedastic case. In addition, FDA is known to perform poorly in the presence of label noise and sparsely labeled data. To overcome these limitations, this work proposes a probabilistic framework for FDA which relaxes the homoscedastic assumption on the class covariance matrices and adds a term to explicitly model the non-discriminative information. This allows the proposed method to be robust to label noise and to be used in the semi-supervised context. Experiments on real-world datasets show that the proposed approach works at least as well as FDA in standard situations and outperforms it in the label noise and sparse label cases
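For orientation, here is a minimal sketch of the classical FDA baseline that the abstract refers to, not of the probabilistic variant it proposes. It computes the within-class and between-class scatter matrices and keeps the leading eigenvectors of S_W^{-1} S_B. The function name `fisher_directions` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def fisher_directions(X, y, d):
    """Return the d leading classical Fisher discriminant directions.

    Illustrative helper (hypothetical name): maximizes between-class scatter
    relative to within-class scatter via the eigenvectors of S_W^{-1} S_B.
    """
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    p = X.shape[1]
    S_W = np.zeros((p, p))  # within-class scatter
    S_B = np.zeros((p, p))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_W += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        S_B += len(Xc) * (diff @ diff.T)
    # Eigen-decomposition of S_W^{-1} S_B; assumes S_W is invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(evals.real)[::-1][:d]
    return evecs[:, order].real

# Toy data: two Gaussian classes separated along the first coordinate
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0, 0.0], 1.0, (100, 3)),
    rng.normal([4.0, 0.0, 0.0], 1.0, (100, 3)),
])
y = np.repeat([0, 1], 100)

U = fisher_directions(X, y, 1)  # p x d projection matrix
Z = X @ U                       # projected data, one column per direction
```

With K classes, S_B has rank at most K - 1, so at most K - 1 directions are informative; the toy example above therefore keeps a single direction.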
Theoretical and practical considerations on the convergence properties of the Fisher-EM algorithm
The Fisher-EM algorithm has been recently proposed in (Bouveyron2011) for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a latent discriminative subspace with a low intrinsic dimension. Although the Fisher-EM algorithm is based on the EM algorithm, it does not at first glance satisfy all the conditions of the EM convergence theory. Its convergence toward a maximum of the likelihood is therefore questionable. The aim of this work is twofold. First, the convergence of the Fisher-EM algorithm is studied from the theoretical point of view. In particular, it is proved that the algorithm converges under weak conditions in the general case. Second, the convergence of the Fisher-EM algorithm is considered from the practical point of view. It is shown that Fisher's criterion can be used as a stopping criterion for the algorithm to improve the clustering accuracy. It is also shown that the Fisher-EM algorithm converges faster than both the EM and CEM algorithms
On the estimation of the latent discriminative subspace in the Fisher-EM algorithm
The Fisher-EM algorithm has been recently proposed in [2] for the simultaneous visualization and clustering of high-dimensional data. It is based on a discriminative latent mixture model which fits the data into a latent discriminative subspace with an intrinsic dimension lower than the dimension of the original space. The Fisher-EM algorithm includes an F-step which estimates the projection matrix whose columns span the discriminative latent space. This matrix is estimated via an optimization problem which is solved using a Gram-Schmidt procedure in the original algorithm. Unfortunately, this procedure suffers in some cases from numerical instabilities which may degrade the visualization quality or the clustering accuracy. Two alternatives for estimating the latent subspace are proposed to overcome this limitation. The optimization problem of the F-step is first recast as a regression-type problem and then reformulated such that the solution can be approximated with an SVD. Experiments on simulated and real datasets show the improvement of the proposed alternatives for both the visualization and the clustering of data
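The kind of numerical instability mentioned above can be illustrated in general terms, independently of the F-step itself. The sketch below is not the authors' procedure: it applies textbook classical Gram-Schmidt to a nearly rank-deficient matrix (a standard Läuchli-type test case, chosen here as an assumption) and compares the result with the orthonormal basis of the same column space returned by `numpy.linalg.svd`. The helper names are hypothetical.

```python
import numpy as np

def classical_gram_schmidt(A):
    """Orthonormalize the columns of A with classical Gram-Schmidt."""
    n, k = A.shape
    Q = np.zeros((n, k))
    for j in range(k):
        # Classical variant: project the ORIGINAL column onto all previous q's at once
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def orthogonality_error(Q):
    """Largest absolute entry of Q^T Q - I (zero for exactly orthonormal columns)."""
    return np.max(np.abs(Q.T @ Q - np.eye(Q.shape[1])))

# Nearly collinear columns: eps**2 falls below double-precision resolution
eps = 1e-8
A = np.array([
    [1.0, 1.0, 1.0],
    [eps, 0.0, 0.0],
    [0.0, eps, 0.0],
    [0.0, 0.0, eps],
])

Q_gs = classical_gram_schmidt(A)
# Left singular vectors give an orthonormal basis of the column space of A
Q_svd, _, _ = np.linalg.svd(A, full_matrices=False)

err_gs = orthogonality_error(Q_gs)
err_svd = orthogonality_error(Q_svd)
```

On this input the Gram-Schmidt basis is visibly non-orthogonal, while the SVD basis stays orthonormal to machine precision, which is the general motivation for preferring an SVD-based formulation when columns are close to collinear.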
Discriminative variable selection for clustering with the sparse Fisher-EM algorithm
The interest in variable selection for clustering has increased recently due to the growing need for clustering high-dimensional data. Variable selection in particular eases both the clustering and the interpretation of the results. Existing approaches have demonstrated the efficiency of variable selection for clustering but turn out to be either very time consuming or not sparse enough in high-dimensional spaces. This work proposes to perform a selection of the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method has been recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through \ell_{1}-type penalizations. Experimental comparisons with existing approaches on simulated and real-world data sets demonstrate the interest of the proposed methodology. An application to the segmentation of hyperspectral images of the planet Mars is also presented
Traveling discontinuity at the quantum butterfly front
We formulate a kinetic theory of quantum information scrambling in the
context of a paradigmatic model of interacting electrons in the vicinity of a
superconducting phase transition. We carefully derive a set of coupled partial
differential equations that effectively govern the dynamics of information
spreading in generic dimensions. They exhibit traveling wave solutions that are
discontinuous at the boundary of the light cone, and have a perfectly causal
structure where the solutions do not spill outside of the light cone. Comment: 31 pages, 7 figures
Luc Courchesne : observateur du monde
Catalogue prepared under the direction of Christine Bernier. Catalogue of the exhibition held at the Carrefour des arts et des sciences, Université de Montréal,
from 13 April to 19 June 2022